One of the most important problems in chemical analysis is the interpretation of analytical data. The difficulty of this 
task has been further compounded by the data explosion. Chemical information relevant to the particular analysis 
problem is hidden within excessive amounts of data. This problem could be alleviated through knowledge and control 
of the information content of the data. Information theory provides a means for the definition, evaluation, and 
manipulation of quantitative information content measurements. This paper provides a general review of some of the 
basic concepts in information theory, including history, terminology, entropy, and other information content measures. 
The application of information theory to chemical problems requires some modifications. The analyst is usually only 
interested in a subset of the information (data) which has been collected. Also, this relevant chemical information is 
dependent upon not only the informational goals of the problem, but the completely specified procedure as well. This 
paper reviews chemical applications of information theory which have been reported in the literature including applica- 
tions to qualitative analysis, quantitative analysis, structural analysis, and analytical techniques. Measures of informa- 
tion and information content and figures of merit for performance evaluations are discussed. The paper concludes with 
a detailed discussion of the application of information theory to electrochemical experiments and the empirical determi- 
nation of the information content of electroanalytical data. 
Introduction 

Data interpretation is one of the most challenging prob- 
lems of chemical analysis. Both the data-rich and the data- 
limited cases stress the need for efficient methods to extract 
chemical information from the available data. Data-rich 
analyses result from the ability of modern chemical instru- 
mentation to generate enormous amounts of data in short 
periods of time. The current trend towards more exotic 
hybrid instruments buries the chemical information even 
deeper within the data. Alternatively, data-limited analyses 
often result from limitations in appropriate sensors, accessi- 
ble techniques, time, and manpower. The need for efficient 
methods to extract chemical information is superseded only 
by the need to acquire information-rich data. 

Improving accessibility of chemical information empha- 
sizes the importance of good experimental design and re- 
quires a re-evaluation of the traditional approach to chemi- 
cal analysis. The typical approach involves using 
foreknowledge about the samples to choose an analytical 
procedure. The analytical procedure, which may involve 
more than one analytical technique, is used to produce as 
much data as possible which is collected for later analysis. 
Data interpretation is performed by the analyst using as 
much intuition and background knowledge as possible. In 
the data-rich case, emphasis is often on data reduction. The 
data-limited case emphasizes information extraction. 

A more efficient and desirable approach to chemical anal- 
ysis would be to maximize the amount of information ob- 
tained relevant to the current problem, while minimizing the 
amount of analytical effort, time, and collected data. The 
evaluation, selection, and optimization of analytical proce- 
dures need to be investigated further. In order to study this 
problem, a method for the quantification of chemical infor- 
mation must be defined. This paper reviews some relevant 
information theory concepts and describes applications to 
chemical analysis. Illustrations have been taken from the 
literature, as well as from our own recent work, to demon- 
strate the value of applied information theory for optimized 
chemical analysis methods. 



N is the number of events, when considering a set of mutu- 
ally exclusive but equally probable events [1,6,7]. For ex- 
ample, consider the measurement of three distinguishable 
intensity levels. If the probabilities of measuring these lev- 
els are 0.25, 0.30, and 0.45, respectively, the average en- 
tropy would equal 1.5 bits. The amount of specific informa- 
tion conveyed by the measurement of each level would 
equal 2.0, 1.7, and 1.1 bits, respectively. Notice that the 
least likely level to be measured does indeed convey the 
most information. The maximum entropy is equal to 1.6 
bits. A thorough treatment of the mathematical basis of 
entropy and its properties is given in a book by Mathai and 
Rathie [2]. 
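
The three-level example can be checked numerically. The following is a minimal sketch, assuming the usual Shannon definitions described in this section; the function names are illustrative and do not come from the cited references. The unrounded results agree with the quoted values to within rounding.

```python
import math

def specific_information(p):
    """Information in bits conveyed by an event of probability p."""
    return -math.log2(p)

def shannon_entropy(probs):
    """Average information in bits: probability-weighted specific information."""
    return sum(p * specific_information(p) for p in probs if p > 0)

def hartley_maximum(n_events):
    """Maximum entropy in bits, reached when all n events are equally probable."""
    return math.log2(n_events)

# Probabilities of the three distinguishable intensity levels from the text.
levels = [0.25, 0.30, 0.45]

for p in levels:
    print(f"p = {p:.2f}: specific information = {specific_information(p):.2f} bits")
print(f"average entropy = {shannon_entropy(levels):.2f} bits")
print(f"maximum entropy = {hartley_maximum(len(levels)):.2f} bits")
```

The least probable level (p = 0.25) carries the largest specific information, as stated above.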

Redundancy [6] is the difference between the maximum 
information and the average information, eq (2). Relative 



Information Theory Concepts 

Information theory [1-4]¹ is concerned with the study of 
information and its transmission. The earliest information 
theorists studied the encoding and decoding of secret codes. 
Modern work on information theory can be traced back to 
the 1920's beginning with Carson's study of frequency 
modulation. Nyquist determined the minimum bandwidth 
required for the transmission of a definite quantity of infor- 
mation. Hartley established a definite bandwidth-time 
product required for the transmission of a definite quantity of 
information. However, most of the present work in informa- 
tion theory is based upon probabilistic models of communi- 
cation developed by C.E. Shannon in 1948 [5]. Simply 
stated, the basic principle is that a message with a high 
probability of random occurrence conveys little informa- 
tion. The most information is conveyed by the message that 
is least likely to spontaneously occur. 

This principle is formalized by the concept of entropy 
which equates information and uncertainty. Entropy is a 
quantitative measure of the amount of information supplied 
by a probabilistic experiment. It is based upon classical 
Boltzmann entropy from statistical physics. Shannon's for- 
mula, eq (1), defines the entropy or the average information 
to be equal to the weighted average of the specific informa- 
tion for each event in the system under consideration. 
Specific information [6] is information conveyed by the 
occurrence of a particular event and is quantified by the 
$-\log_2$ of the probability of the event, $p(x_i)$. Entropy is 
limited by a maximum of $\log_2 N$ (Hartley's equation), where 



¹Figures in brackets indicate literature references. 



redundancy is the ratio of the redundancy to the maximum 
information [6]. Relative information content is the ratio of 
the actual average information to the maximum information 
[1]. Relative redundancy can then be expressed as the remaining 
fraction not due to relative information [1]. In the above 
example there is 0.1 bit of redundancy and 0.062 relative 
redundancy. If the actual average information is equal to 1.4 
bits, the relative information content is equal to 0.88 and the 
relative redundancy equals 0.12. 
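
These ratios are simple enough to verify directly. The sketch below assumes the definitions given in the text and uses the rounded entropies quoted there (1.6 bits maximum, 1.5 or 1.4 bits average); the function names are illustrative.

```python
def redundancy(h_max, h_avg):
    """Difference between maximum and average information, in bits."""
    return h_max - h_avg

def relative_redundancy(h_max, h_avg):
    """Redundancy expressed as a fraction of the maximum information."""
    return (h_max - h_avg) / h_max

def relative_information(h_max, h_avg):
    """Average information expressed as a fraction of the maximum information."""
    return h_avg / h_max

# Rounded values from the three-level example.
print(redundancy(1.6, 1.5))            # about 0.1 bit
print(relative_redundancy(1.6, 1.5))   # about 0.062
# Case in which the actual average information is 1.4 bits.
print(relative_information(1.6, 1.4))  # about 0.88
print(relative_redundancy(1.6, 1.4))   # about 0.12
```

Relative information content and relative redundancy sum to one by construction.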



Types of Information 

The concept of information as used in information theory 
refers to the choice or uncertainty of outcomes when regard- 
ing distinguishable elements with respect to some random 
mechanism. Information is a system based upon elements 
that have been agreed upon for the representation of infor- 
mation (characters) and the relationships between them 
(codes). It is not a measure of meaning as used in the usual 
sense, which implies a subjective evaluation. The concept 
of information as used in chemical analysis encompasses the 
uncertainty regarding the quantity, identity, chemical struc- 
ture, or properties of the analyte of interest. 

Preinformation or foreknowledge [1] is the prior knowl- 
edge concerning the occurrence of events. The information 
conveyed by the occurrence of more than one independent 
event is simply the sum of the information conveyed by each 
event individually. However, if the occurrence of a second 
event is dependent upon the occurrence of the first event, the 
foreknowledge of the first event reduces the amount of in- 
formation conveyed by the second event. Therefore, the 
amount of information conveyed by a series of events within 
the system under consideration is always less than or equal 
to the sum of the information conveyed by each of the events 
separately. 

Preinformation is probably the most commonly used in- 
formation theory concept in chemical analysis. Chemical 
preinformation [8] is information that is known prior to 
performing the current analysis. It may result from experi- 
ence, preliminary analyses, etc. It is used to reduce the 
effort required to solve the analytical problem. Preinforma- 
tion may be quantified through the use of entropy-based 
measures [8]. The uncertainty before the analysis for a dis- 
crete variable, such as chemical identity, is quantified by the 
use of the a priori probabilities for identification in Shan- 
non's equation (eq 1). The uncertainty for a continuous 
variable such as concentration or signal intensity is ex- 
pressed by integrating the a priori probability density func- 
tion over the range of interest, eq (3). 



$$H(X) = -\int_{x_1}^{x_2} p(x)\,\log_2 p(x)\,dx \qquad (3)$$
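
As an illustration of the continuous case, the entropy integral can be evaluated numerically. This is a rough sketch using a midpoint rule; the helper names, the integration limits, and the step count are assumptions chosen for the example. A uniform density on a range of width 8 should give log2(8) = 3 bits, and a normal density with standard deviation sigma should give log2(sigma * sqrt(2 * pi * e)).

```python
import math

def differential_entropy(pdf, lower, upper, steps=20000):
    """Midpoint-rule estimate of the integral of -p(x) log2 p(x) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        p = pdf(lower + (k + 0.5) * width)
        if p > 0.0:
            total -= p * math.log2(p) * width
    return total

def uniform_pdf(x0, x1):
    """Rectangular density on [x0, x1]."""
    return lambda x: 1.0 / (x1 - x0) if x0 <= x <= x1 else 0.0

def normal_pdf(mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm

h_uniform = differential_entropy(uniform_pdf(2.0, 10.0), 2.0, 10.0)
h_normal = differential_entropy(normal_pdf(0.0, 0.5), -5.0, 5.0)
print(f"uniform on [2, 10]: {h_uniform:.3f} bits")
print(f"normal, sigma = 0.5: {h_normal:.3f} bits")
```

Unlike the discrete entropy, this quantity depends on the units of x and can be negative for narrow distributions.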



Joint information [1] is information that is provided by 
more than one event. It can be quantified by substituting the 
joint probability of occurrence for the events into Shannon's 
equation (eq 1). In the case of independent events, the joint 
probability of occurrence is simply the product of the a 
priori probabilities of occurrence. For nonindependent 
events, it is the product of the a priori probability for the 
first event with the conditional probabilities for the other 
events. 
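
A short numerical sketch makes the difference between independent and dependent events concrete. The probabilities below are hypothetical and chosen only for illustration; the joint entropy is obtained by substituting joint probabilities into Shannon's formula as described above.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Marginal distributions of the first and second event of a joint table."""
    first, second = {}, {}
    for (a, b), p in joint.items():
        first[a] = first.get(a, 0.0) + p
        second[b] = second.get(b, 0.0) + p
    return first, second

# Hypothetical a priori probabilities for two binary events.
p_a = {"peak": 0.5, "no peak": 0.5}
p_b = {"peak": 0.25, "no peak": 0.75}

# Independent events: joint probability is the product of the a priori values.
independent = {(a, b): p_a[a] * p_b[b] for a in p_a for b in p_b}

# Dependent events: a priori probability of the first event times the
# conditional probability of the second given the first (hypothetical values).
p_b_given_a = {
    "peak": {"peak": 0.9, "no peak": 0.1},
    "no peak": {"peak": 0.2, "no peak": 0.8},
}
dependent = {(a, b): p_a[a] * p_b_given_a[a][b] for a in p_a for b in p_b_given_a[a]}

for name, joint in (("independent", independent), ("dependent", dependent)):
    first, second = marginals(joint)
    separate = entropy(first.values()) + entropy(second.values())
    print(f"{name}: joint = {entropy(joint.values()):.3f} bits, "
          f"sum of separate = {separate:.3f} bits")
```

For the independent table the two numbers coincide; for the dependent table the joint information is smaller than the sum, as the discussion of preinformation predicts.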

Mutual information describes the amount of information 
in one event that determines the state of another event. It 
may also be thought of as the average amount of information 
required to distinguish members of different classes. Isen- 
hour et al. [9] investigated the relationship between mutual 
information and classification in the determination of chem- 
ical functionality for 200 compounds based upon binary 
encoded (peak/no peak) infrared spectra. Mutual informa- 
tion was calculated as the difference between the total aver- 
age entropy and the average conditional entropy. The total 
average entropy is the average amount of information re- 
quired to distinguish between the 200 spectra under consid- 
eration. It is calculated as a weighted average of the proba- 
bility of occurrence of a peak maximum for each spectral 
interval using Shannon's formula (eq 1). The average condi- 
tional entropy is the average amount of information required 
to distinguish between members of the same class. Average 
conditional entropy is calculated as the sum of the class 
conditional entropies weighted by class size. In the case of 
two separable, equally probable classes, the independent 
mutual information is equal to one bit. A value for mutual 
information greater than one bit implies the inclusion of 
redundant information in the data. The square root of mutual 



information was shown to be linearly related to the maxi- 
mum likelihood classification ability. 
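
The entropy difference described above can be sketched for a single binary-coded feature. This is a simplified illustration, not the procedure of reference [9]: it assumes one observed code per sample rather than a full set of spectral intervals, and the sample data are hypothetical.

```python
import math
from collections import Counter

def entropy_of(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mutual_information(samples):
    """Total entropy minus class-size-weighted conditional entropy.

    samples is a list of (class_label, observed_code) pairs.
    """
    codes = [code for _, code in samples]
    by_class = {}
    for label, code in samples:
        by_class.setdefault(label, []).append(code)
    conditional = sum(
        len(members) / len(samples) * entropy_of(members)
        for members in by_class.values()
    )
    return entropy_of(codes) - conditional

# Two separable, equally probable classes: the code identifies the class exactly.
separable = [("A", 1)] * 4 + [("B", 0)] * 4
# Overlapping classes: one sample in each class carries the other class's code.
overlapping = [("A", 1)] * 3 + [("A", 0)] + [("B", 0)] * 3 + [("B", 1)]

print(mutual_information(separable))    # 1.0 bit
print(mutual_information(overlapping))  # about 0.19 bit
```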



Figures of Merit 

The application of information theory concepts to chem- 
istry is most familiar in the evaluation of analytical methods. 
Figures of merit such as accuracy, precision, and detection 
limit have long been used to evaluate the attainment of 
informational goals such as concentration, resolution, and 
sensitivity, respectively. Figures of merit are measures of 
goal achievement for completely specified procedures that 
can be used for evaluating, selecting, and comparing analyt- 
ical procedures. Other quantifiable factors that can be used 
to determine the applicability of analytical procedures to a 
particular problem include sensitivity, selectivity, speed of 
analysis, personnel requirements, and cost of the analytical 
procedure. 

Grys [10] described five new functional concepts: accu- 
racy, limit of detection, firmness, efficiency, and cost, 
which result in judgements of acceptability of analytical 
methods. Accuracy is expressed in terms of recovery and 
reproducibility. The contribution due to recovery can be 
calculated by summing the percentage of different losses 
throughout the whole procedure. Reproducibility is ex- 
pressed by the ratio of the full range to 100 times the ideal 
range. 

Limit of detection is the concentration of a sample that 
gives a reading that is equal to twice the confidence half- 
interval for a series of ten determinations of the blank test 
value determined to 99% certainty. It is measured in mg per 
kg, or ppm, and is given by multiplying the standard devi- 
ation by 2.17, a factor that is determined from the t test for 
$t_{0.01}$ and $n = 10$. 

Firmness is an index of the effects of different factors 
upon the results. It is equal to the total deviation from 
expected values caused by the presence of equimolar 
amounts of interfering substances or connected with 5% 
changes in optimum reaction conditions such as acidity or 
reagent concentrations. 

Efficiency provides information about the time consump- 
tion during the course of the whole procedure. It is ex- 
pressed as the time of effective labor for one sample in 
minutes divided by 100. 

Cost is a measure of the expenditure of materials and 
equipment used for the analysis of one sample by a new 
method in relation to the least expensive method. The cost 
of any desired method by which the analysis can be per- 
formed may be substituted for that of the least expensive 
method. It is given by dividing the ratio of the cost of the 
new method to the old method by 1000. 

Eckschlager [11] discussed two informational variables, 
time-information performance and specific price of informa- 
tion, which can be utilized in the evaluation and optimiza- 
tion of analytical methods. Time-information performance 
[12] can be rewritten as the ratio of information content to 
the time required for the analysis, including the analysis 
itself plus the time required to prepare the equipment for the 
analysis of the next sample. The time required for analysis 
can be partitioned into two segments, the basis time and the 
time required for the performance of N parallel determina- 
tions. The specific price of the information [11] is defined 
as the ratio of the cost of the analysis to the amount of 
information obtained through simultaneous determination of 
N components. 

Danzer and Eckschlager [13] defined a general measure 
of information efficiency as the product of efficiency coeffi- 
cients that are based upon the ratio of the value of the 
variable characterizing the properties required for the solu- 
tion of particular analytical assignments to the actual value 
that the method provides. A ratio greater than one implies 
that more of the property is required than is provided by the 
method and the efficiency coefficient is assigned the value 
of zero. Otherwise, the efficiency coefficient is assigned the 
value of the ratio. They also defined a measure of informa- 
tion profitability as the ratio of the information efficiency to 
the specific price of the information. Results for the determi- 
nation of manganese in low-alloy steels were tabulated for 
seven analytical methods: titrimetry, potentiometric titra- 
tion, photometry, atomic absorption spectrophotometry, op- 
tical emission spectroscopy, optical emission spectrogra- 
phy, and optical emission spectrometry. The results 
demonstrated that information efficiency and information 
profitability are not always correlated. For example, al- 
though potentiometric titration has almost five times the 
information efficiency of titrimetry, both methods have the 
same information profitability when the duration of the anal- 
ysis should be less than one day. 
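
The efficiency and profitability measures can be expressed in a few lines. The following sketch follows the verbal definitions above; the function names and all numerical values are hypothetical and are not taken from the manganese study.

```python
import math

def efficiency_coefficient(required, provided):
    """Ratio of the required value of a property to the value the method provides.

    A ratio greater than one means the method falls short, so the coefficient is zero.
    """
    ratio = required / provided
    return 0.0 if ratio > 1.0 else ratio

def information_efficiency(pairs):
    """Product of efficiency coefficients over (required, provided) pairs."""
    return math.prod(efficiency_coefficient(r, p) for r, p in pairs)

def specific_price(cost, information_bits):
    """Cost of the analysis per bit of information obtained."""
    return cost / information_bits

def information_profitability(efficiency, price):
    """Information efficiency divided by the specific price of information."""
    return efficiency / price

# Hypothetical method: two properties, each given as (required, provided).
adequate = [(4.0, 5.0), (10.0, 20.0)]
inadequate = [(6.0, 5.0), (10.0, 20.0)]

eff = information_efficiency(adequate)
price = specific_price(cost=20.0, information_bits=8.0)
print(eff)                                       # 0.4
print(information_efficiency(inadequate))        # 0.0
print(information_profitability(eff, price))     # 0.16
```

Because profitability divides by price, a method with higher efficiency can have the same profitability as a cheaper, less efficient one, which is consistent with the observation that the two measures are not always correlated.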



Informing Power 

Informing power is a measure of the amount of informa- 
tion available in a given analytical procedure. The concept 
was developed by Kaiser [14] with respect to spectrochem- 
ical methods of analysis, eq (4), as a function of the resolving 
power, $R(\nu)$, the maximal number of discernable steps for 
the amplitude, $S(\nu)$, and the spectral range, $\nu_a$ to $\nu_b$. If the 
resolving power and the maximal number of steps are fairly 
constant over the spectral range under consideration, in- 
forming power reduces to eq (5). For example, a grating 



spectrograph system with a resolving power of $2 \times 10^5$, a 
spectral range from 2000 to 8000 Å, and 100 discernable 
steps in measured intensity levels at each wavelength would 
have an informing power of $2 \times 10^6$ bits. Here 
the resolving power is the most important factor in 
maximizing informing power. In the case of a nondisper- 
sive, monochromatic method, resolving power between 
peaks at different wavelengths is not applicable and inform- 
ing power is simply the $\log_2$ of the number of discernable 
amplitude steps at that wavelength. For example, 100 dis- 
cernable intensity steps yield an informing power of 7 bits. 
The informing power for the corresponding polychromatic 
method is that for the monochromatic method multiplied by 
the number of frequencies. If the number of steps is different 
for each of the different frequencies, then the informing 
power is the same as for a collection of monochromatic 
methods, and the $\log_2$ of the number of steps is summed 
over each of the different frequencies. 
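
The grating spectrograph figure can be reproduced with a short calculation. The sketch below assumes that the constant-parameter form of the informing power is the resolving power times log2 of the number of steps times the natural logarithm of the ratio of the range limits; this form is an assumption made here for illustration, chosen because it reproduces the value quoted in the text. The function names are illustrative.

```python
import math

def informing_power(resolving_power, steps, lower, upper):
    """Assumed constant-parameter form: R * log2(S) * ln(upper / lower), in bits."""
    return resolving_power * math.log2(steps) * math.log(upper / lower)

def monochromatic_informing_power(steps):
    """Nondispersive, single-wavelength case: log2 of the discernable steps."""
    return math.log2(steps)

def polychromatic_informing_power(steps_per_frequency):
    """Sum of log2(steps) over a collection of monochromatic channels."""
    return sum(math.log2(s) for s in steps_per_frequency)

# Grating spectrograph from the text: R = 2e5, 2000 to 8000 angstroms, 100 steps.
grating = informing_power(2e5, 100, 2000.0, 8000.0)
print(f"grating spectrograph: {grating:.2e} bits")
print(f"monochromatic, 100 steps: {monochromatic_informing_power(100):.1f} bits")
print(f"three channels: {polychromatic_informing_power([100, 50, 200]):.1f} bits")
```

Because the resolving power enters linearly while the number of steps enters only logarithmically, improving resolution has the larger effect on the result.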

Fitzgerald and Winefordner [15] extended the application 
of informing power to time-resolved spectrometric systems 
with the addition of a second resolving parameter, $R_t$. If 
both resolving powers and number of discernable steps are 
nearly constant over the range, then informing power re- 
duces to eq (5) multiplied by $R_t \ln(t_2/t_1)$. For example, an 
atomic fluorescence spectrometer with an average resolving 
power of 3000 over a spectral range from 200 to 500 nm 
with an average of 200 discernable intensity steps has an 
informing power of $5.7 \times 10^4$ bits. The informing power is 
increased to $9.7 \times 10^6$ bits for a range of $10^{-9}$ ($t_1$) to $10^{-6}$ sec 
with a measurement time limited by the lifetimes of the 
excited species of $10^{-7}$ sec ($t_2$) and a $\delta t$ of $10^{-9}$ sec ($R_t$ 
equals $(t_2 - t_1)/\delta t$). Comparisons of the informing power for 
a single beam molecular absorption spectrophotometer, nor- 
mal molecular absorption phosphorimetry and time- 
resolved phosphorimetry showed an increase in the inform- 
ing power by a factor of two for the normal phosphorimeter 
over the spectrophotometer. The addition of a time resolu- 
tion element to phosphorimetry increased informing power 
by a factor of 450. The addition of a time resolution element 
to atomic fluorescence spectrometry increased the informing 
power by a factor of 170. Informing power was also used to 
compare analytical methods as well as to compare analytical 
instruments. A general photon counter was shown to have 
an informing power three times larger than that for an analog 
synchronous detection system. 

Yost extended the application of informing power to 
tandem mass spectrometry [16,17], a method capable of 
generating enormous amounts of data. In the case of a 
quadrupole mass filter, the minimum resolution element, $\delta x$, 
is constant rather than the resolving power, $R(x)$. Informing 
power can then be calculated as shown in eq (6). A quad- 
rupole mass spectrometer with unit mass resolution, a mass 
range of 1000, and an ion intensity range of $2^{12}$ (12 bits) would 
have an informing power of $1.2 \times 10^4$ bits. The addition of 
another resolution element produces a double integral in the 
informing power equation that is equivalent to the 
product of the informing power of the two elements. The 
addition of a capillary gas chromatograph with a nearly 
constant $10^5$ theoretical plate resolution over a one hour 
analysis time results in $6.6 \times 10^6$ bits of informing power. 
The addition of a second quadrupole mass spectrometer with 
the same characteristics results in $1.2 \times 10^7$ bits of informing 
power. The combination of a capillary gc/ms/ms system 
results in an informing power of $6.6 \times 10^9$ bits, an increase 
by a factor of $5.5 \times 10^5$ over the original quadrupole mass 
spectrometer. The effect of experimental parameters on in- 
forming power was also demonstrated by Yost [16,17]. The 
variables associated with the collisionally activated dissoci- 
ation process are potential resolution elements. Energy- and 
pressure-resolved ms/ms has an informing power of 
$3.6 \times 10^9$ bits. 
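
The quadrupole figures can be checked with a small sketch. It assumes that, for a constant resolution element, the informing power is the number of resolution elements times log2 of the number of intensity steps, and that a second identical mass analyzer multiplies the number of resolution elements. Both are interpretations adopted here because they reproduce the quoted values; the function names are illustrative.

```python
import math

def resolution_elements(span, delta):
    """Number of resolvable positions for a constant resolution element delta."""
    return span / delta

def informing_power_constant_delta(n_elements, intensity_steps):
    """Assumed form for constant resolution: elements * log2(intensity steps)."""
    return n_elements * math.log2(intensity_steps)

mass_elements = resolution_elements(1000, 1)   # unit resolution, mass range 1000
intensity_steps = 2 ** 12

ms = informing_power_constant_delta(mass_elements, intensity_steps)
# Second identical quadrupole: the resolution elements multiply (assumption).
msms = informing_power_constant_delta(mass_elements * mass_elements, intensity_steps)

print(f"MS:    {ms:.2e} bits")    # 1.20e+04
print(f"MS/MS: {msms:.2e} bits")  # 1.20e+07
# Ratio of the quoted GC/MS/MS value to the single quadrupole value.
print(f"gain of gc/ms/ms over MS: {6.6e9 / ms:.1e}")  # 5.5e+05
```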

The informing power metric can also be applied to elec- 
trochemistry. A current range of −20 to +20 µA that 
can be measured to within 0.005 µA yields $4 \times 10^3$ discern- 
able steps. Eq (6) can be used to calculate the informing 
power for a cyclic staircase voltammetry (CSCV) experi- 
ment in which each current pulse is sampled and analyzed. 
An experiment with a staircase step of 13.5 mV ($\delta x$) and a 
potential range scanned from 0.0 to −1.73 V yields 
$3.1 \times 10^3$ bits of information. 

Boudreau and Perone [18] demonstrated quantitative res- 
olution in programmed potential step voltammetry for over- 
lapped peaks with 30 mV separation between half wave 
potentials. If only resolved peaks are analyzed and the 
smallest resolution is 30 mV, the informing power is 
$1.4 \times 10^3$ bits. The addition of a time resolution element 
increases the informing power for electrochemical methods. 
Taking 45 equally spaced current measurements on each 
step at a sweep rate of 1.00 V/sec for the CSCV experiment 
increases the amount of information to $3.5 \times 10^7$ bits. The 
amount of information obtained from CSCV experiments 
can be easily manipulated by changing or adding resolution 
parameters. 
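
The two CSCV values without time resolution can be reproduced as follows. This sketch rests on two assumptions that are not stated explicitly in the text but that match the quoted numbers: a reading good to within 0.005 µA is treated as a step width of 0.01 µA, and the cyclic experiment is counted as two sweeps over the potential range. The function names are illustrative.

```python
import math

def discernable_steps(full_range, uncertainty):
    """Number of steps, treating +/- uncertainty as a step width of 2 * uncertainty."""
    return full_range / (2.0 * uncertainty)

def cscv_informing_power(potential_span, resolution, steps, sweeps=2):
    """Assumed form: sweeps * (span / resolution) * log2(steps), in bits."""
    return sweeps * (potential_span / resolution) * math.log2(steps)

steps = discernable_steps(full_range=40.0, uncertainty=0.005)   # microamperes
every_step = cscv_informing_power(1.73, 0.0135, steps)          # 13.5 mV staircase
resolved_only = cscv_informing_power(1.73, 0.030, steps)        # 30 mV resolution

print(f"discernable steps: {steps:.0f}")
print(f"13.5 mV step:      {every_step:.2e} bits")
print(f"30 mV resolution:  {resolved_only:.2e} bits")
```

Coarsening the potential resolution from 13.5 mV to 30 mV lowers the result roughly in proportion, which illustrates how the amount of information is controlled by the resolution parameters.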

Informing power can be used as a figure of merit for a 
completely specified method or system. Although the in- 
forming power of instrumental techniques may seem exces- 
sive when compared to the maximum information as calcu- 
lated by Hartley's formula, it must be remembered that 
informing power is simply a measure of the maximal num- 
ber of bits of information available in the procedure, not 
necessarily the useable or necessary amount of information. 



Limitations in informing power arise from differences be- 
tween practical and calculated resolving power. The lack of 
independence between the bits of information, noise, and 
interference result in the reduction of the useful informing 
power. Informing power can be partitioned into the amount 
of information required for the solution of the problem and 
the amount of redundant information required to provide a 
given level of confidence. 



Information Content 

One of the most important concepts in information theory 
is that of informational gain or information content [19]. 
This is equal to the change in entropy due to the experi- 
ment and is quantified by the difference between the en- 
tropy using a priori probabilities and the entropy using a 
posteriori probabilities, eq (7). The use of Shannon's 
formula, eq (1), to calculate the entropy does not guarantee a 
non-negative information content. However, another infor- 
mational measure, eq (8), always results in non-negative 
values. For equal a priori probabilities, information content 
as calculated by eqs (7) and (8) is equivalent. Since information 
content as discussed above can only be established after the 
analysis, these measures cannot be used as a quality crite- 
rion for selecting an analytical procedure. However, they 
can be used to evaluate the performance of a procedure. 
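
The contrast between the two measures can be illustrated numerically. In the sketch below, the entropy difference follows the description of eq (7); for the second measure a Kullback-type divergence, the sum of p log2(p/p0), is assumed as a stand-in for eq (8), since it has the stated properties of being non-negative and of coinciding with the entropy difference for equal a priori probabilities. The probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_difference(prior, posterior):
    """A priori entropy minus a posteriori entropy, in bits."""
    return entropy(prior) - entropy(posterior)

def divergence(prior, posterior):
    """Assumed divergence measure: sum of p * log2(p / p0), in bits."""
    return sum(p * math.log2(p / p0) for p0, p in zip(prior, posterior) if p > 0)

# Unequal a priori probabilities: the analysis contradicts a strong expectation.
skewed_prior, flat_posterior = [0.9, 0.1], [0.5, 0.5]
print(entropy_difference(skewed_prior, flat_posterior))  # negative
print(divergence(skewed_prior, flat_posterior))          # positive

# Equal a priori probabilities: both measures agree.
equal_prior, sharp_posterior = [0.25] * 4, [0.7, 0.1, 0.1, 0.1]
print(entropy_difference(equal_prior, sharp_posterior))
print(divergence(equal_prior, sharp_posterior))
```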

Measures of information content have been applied to in- 
formation theory models of structural analysis, qualitative 
analysis, quantitative analysis, trace analysis and instrumen- 
tal analysis. Eckschlager and Stepanek have published a 
book [7] and a review article [8] on the application of infor- 
mation theory to analytical chemistry. 



Structural Analysis 

One of the most difficult analytical tasks is the unambigu- 
ous determination of chemical structure. However, applica- 
tion of information theory to structural analysis is based 
upon a relatively simple entropy model [8] and an informa- 
tional measure introduced by Brillouin [20]. The input con- 
sists of a finite number, $n_0$, of equally probable identities 
such as functional groups or conformational arrangements. 
The output is a portion of a signal that corresponds to the 
identity, such as an IR band, NMR peak, or MS m/z peak, 
encoded only as to its presence or absence. The number of 
possible, but as yet undistinguished, structural arrangements 
is $n$, where $1 \le n \le n_0$ and $n = 1$ for an unambiguous determi- 
nation of the structure for the analyte. 

The uncertainty prior to analysis can be expressed by 
substituting the a priori probabilities into Shannon's equa- 
tion, eq (1). Since the a priori probabilities are equal, that is 
$1/n_0$, the situation reduces to the case of maximum 
information (Hartley's equation) and the uncertainty is equal 
to $\log_2 n_0$. The uncertainty after analysis is equal to $\log_2 n$. 
The decrease in uncertainty due to the analysis corresponds 
to the informational gain, eq (7), and is equal to $\log_2(n_0/n)$. 
It assumes its maximum value in the case of the unambigu- 
ous determination of the structure and is equal to $\log_2 n_0$. 



Qualitative Analysis 

The input for the model of qualitative analysis consists of 
a set of discrete identities, $X_i$, where $i = 1, 2, \ldots, n_0$. If the 
output consists of a number of discrete, equally likely iden- 
tities, the limiting case of Shannon's equation (Hartley's 
equation) can be used to calculate the entropy, and the 
information gained can be expressed by a ratio of the num- 
ber of possible components before and after the analyses 
eq (9) [7]. For example, consider the case of an addition of 



$$I(p, p_0) = \log_2\left(\frac{n_0}{n}\right) \qquad (9)$$



HCl to a solution that contains only one of a list of 25 
possible cations, three of which can be precipitated by HCl. 
The information gained as evidenced by a precipitate would 
be equal to $\log_2(25/3)$, or 3 bits. The lack of a precipitate 
would imply an informational gain of only 0.2 bits. To 
consider various combinations of components, the total 
number of possible combinations is given by eq (10), where 
M is the total number of components, and is divided into 
groups of m components. In the case of a solution that 
contains from one to six cations, the total number of combi- 
nations is equal to 53. If two of them can be precipitated by 
HCl, there are 15 combinations in which neither cation is 
present and the information gained is 1.8 bits. For the 38 
remaining cases in which one or both cations are 
present, as evidenced by the appearance of a precipitate, an 
informational gain of 0.5 bit results. 
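
The precipitation examples reduce to the logarithm of a ratio of counts. The sketch below assumes equally probable identities before and after the test, as the text does, and takes the counts (25, 3, 53, 15, 38) directly from the examples above; the function name is illustrative.

```python
import math

def information_gain(n_before, n_after):
    """Gain in bits when n_before equally likely possibilities shrink to n_after."""
    return math.log2(n_before / n_after)

# One cation out of 25, three of which precipitate with HCl.
print(f"precipitate:    {information_gain(25, 3):.2f} bits")       # about 3
print(f"no precipitate: {information_gain(25, 25 - 3):.2f} bits")  # about 0.2

# Combinations example, using the counts quoted in the text.
print(f"neither cation present: {information_gain(53, 15):.2f} bits")  # about 1.8
print(f"one or both present:    {information_gain(53, 38):.2f} bits")  # about 0.5

# Structural analysis limit: an unambiguous result leaves a single possibility.
print(f"unambiguous, 25 candidates: {information_gain(25, 1):.2f} bits")
```

The less likely outcome (a precipitate in the first case, no precipitate in the second) is the one that conveys more information.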



In the case of instrumental or chromatographic qualitative 
or identification analyses, the output is a set of discrete 
signals in positions $Y_j$, where $j = 1, 2, \ldots, m$ and $m > n_0$ [8]. 
The entropy can be expressed by Shannon's formula, eq (1), 
and reaches a maximum when all of the possible identities, 
$X_i$, are equally likely. It is equal to zero when one identity 
is confirmed and the others are excluded, as would be the 
case for the a posteriori entropy for an unambiguous identi- 
fication. The interpretation of these signals leads to an input- 
output relationship for the system that is represented by a set 
of a posteriori conditional probabilities, $p(x_i|y_j)$. The inter- 
pretation of these signals is also dependent upon preinfor- 
mation represented by the a priori probabilities that may be 
calculated by Bayes' theorem, eq (11) [19]. 
The information content of an analytical signal is defined 
as the decrease in uncertainty, eq (7). In the case of unam- 
biguous determinations, $H(X|Y)=0$ and $I(X|Y)=H(X)$, and 
H(X) is also considered as the information required for 
unambiguous determination. However, most qualitative or 
identification procedures are chosen so as to minimize the 
uncertainty in identification for every possible signal. This 
is quantified by the informational measure of equivocation. 
Equivocation [8,19] is a measure of the expected or average 
value of the uncertainty after analysis eq (12). Equivocation 



$$E = H(X|Y) = \sum_j p(y_j)\,H(X|y_j) \qquad (12)$$



and information content are complementary quantities, their 
sum equaling the entropy of the identification procedure. 
For an "ideal" procedure or an unambiguous determination, 
equivocation equals zero and information equals entropy. 
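
The relationship between a priori probabilities, a posteriori probabilities, equivocation, and information content can be sketched for a small identification problem. The three compounds, two signal positions, and all probabilities below are hypothetical; the a posteriori probabilities are obtained with Bayes' theorem in its standard form.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def signal_probability(prior, likelihood, signal):
    """p(y): total probability of observing the signal."""
    return sum(prior[x] * likelihood[x].get(signal, 0.0) for x in prior)

def posterior(prior, likelihood, signal):
    """p(x|y) for every identity x, by Bayes' theorem."""
    p_y = signal_probability(prior, likelihood, signal)
    return {x: prior[x] * likelihood[x].get(signal, 0.0) / p_y for x in prior}

def equivocation(prior, likelihood, signals):
    """Expected a posteriori uncertainty: sum over signals of p(y) * H(X|y)."""
    total = 0.0
    for y in signals:
        p_y = signal_probability(prior, likelihood, y)
        if p_y > 0:
            total += p_y * entropy(posterior(prior, likelihood, y).values())
    return total

# Three equally likely compounds; compounds x2 and x3 give the same spot.
prior = {"x1": 1 / 3, "x2": 1 / 3, "x3": 1 / 3}
likelihood = {"x1": {"spot a": 1.0}, "x2": {"spot b": 1.0}, "x3": {"spot b": 1.0}}
signals = ["spot a", "spot b"]

h_x = entropy(prior.values())
e = equivocation(prior, likelihood, signals)
print(f"entropy H(X)        = {h_x:.3f} bits")
print(f"equivocation E      = {e:.3f} bits")
print(f"information content = {h_x - e:.3f} bits")
```

If every compound gave a distinct spot, the equivocation would be zero and the information content would equal the full entropy, which is the "ideal" procedure described above.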

Cleij and Dijkstra [21] demonstrated the use of informa- 
tion content and equivocation in the evaluation of thin-layer 
chromatographic procedures. Information content and 
equivocation were calculated for the identification of DDT 
and 12 related compounds for 33 different TLC systems. 
Calculations of the equivocation for the combinations of two 
TLC systems showed that the best combinations are not 
produced by combinations of the best individual TLC sys- 
tems. This reflects the correlation between the best individ- 
ual TLC systems. 

Another method for quantifying information content is 
from the perspective of the possible signals rather than the 
possible identities [19]. Signal entropy, H(Y), is the uncer- 
tainty in the identity of the unknown signal and can be 
quantified by substituting the probabilities of measuring the 
signals into Shannon's equation, eq (1). The conditional 
entropy, $H(Y|x_i)$, is the uncertainty in the signal if the com- 
pound is known to be $x_i$. It can be considered a measure of 
noise and is given by substituting the conditional probabili- 
ties into Shannon's equation, eq (1). Expected values for 
entropy and information content can be expressed in a man- 
ner analogous to that shown above. Also, since entropy is 
strong-additive [2,19], information content can be ex- 
pressed in terms of the signal entropies. 

Dupuis and coworkers [22,23] applied these methods to 
gas-liquid chromatography. The information content for 10 
stationary phases used in gas-liquid chromatography was 
calculated on the basis of compound identification by re- 
trieval of retention indices from a compiled library for a set 
of 248 compounds [22]. The information content per 
column ranged from 6.5 to 7.0 bits. The information content 
for combinations of columns is dependent upon both the 
number of columns and the sequence of columns. Ten se- 
quences of the 10 columns yielded an information content of 
43.3 bits. The study was expanded to include 16 gas-liquid 
chromatography stationary phases [23]. The complete data 
set of 248 compounds, a subset of 48 aliphatic alcohols, a 
subset of 35 aldehydes/ketones, and a subset of 60 esters 
were explored. For all four sets of compounds, combina- 
tions of stationary phases that yielded the most information 
consisted of one nonpolar phase plus one or more polar 
phases. 

Van Marlen and Dijkstra [24] calculated the information 
content for the identification of binary coded mass spectra 
by retrieval and determined the optimal sequence of masses 
which contained the most information. A set of approxi- 
mately 10,000 low resolution mass spectra were binary en- 
coded using a threshold intensity level of 1% of the intensity 
of the base peak. Masses greater than 300 did not yield any 
additional information. Individually, masses of 300 or less 
contained from zero (m/z=3,4,5,6,7) to one 
(m/z=29,40,51,53,57,69,77) bit of information. The opti- 
mal mass sequence of 120 masses contained 40.9 bits of 
information, demonstrating the obvious redundancy in the 
binary coded spectra. The optimal mass sequence is depen- 
dent upon the distribution of the peaks. A set of 200 binary 
coded alkane spectra yielded 9 bits of information for 25 
selected masses. 
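
The range of zero to one bit per mass follows from the entropy of a binary variable, and the redundancy follows from correlation between masses. The sketch below illustrates both points on a hypothetical four-spectrum library. The spectra and function names are assumptions made for illustration, not data or code from ref. [24].

```python
import math
from collections import Counter

def binary_channel_information(spectra, mass):
    """Entropy (bits) of the peak/no-peak variable at one mass."""
    p = sum(1 for s in spectra if mass in s) / len(spectra)
    if p in (0.0, 1.0):
        return 0.0  # a peak that is never (or always) present tells nothing
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def joint_information(spectra, masses):
    """Entropy (bits) of the joint peak/no-peak pattern over several masses."""
    patterns = Counter(tuple(m in s for m in masses) for s in spectra)
    n = len(spectra)
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())

# Hypothetical library: each spectrum is the set of masses above threshold
library = [{29, 43, 57}, {29, 41}, {43, 57, 71}, {41, 77}]

info_29 = binary_channel_information(library, 29)  # peak in half the spectra
info_3 = binary_channel_information(library, 3)    # peak never present
info_43_57 = joint_information(library, [43, 57])  # perfectly correlated pair
```

Here masses 43 and 57 each carry one bit on their own, but together they still carry only one bit because they always occur together. This is the kind of redundancy that keeps the information of a long mass sequence far below the number of masses in it.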



Quantitative Analysis 

The model for quantitative analysis is a two-stage model 
[8]. The input is a continuous distribution that produces a 
continuously variable signal. In the second stage, the signal 
specified by both position and intensity is decoded into 
results. The distribution of the results for parallel determina- 
tions is generally normal. Preinformation indicates that the 
content of the component lies within a specified range of $x_0$ 
to $x_1$, so the a priori probability density is that of a 
rectangular or uniform distribution. The information content 
is considered a divergence information measure that represents 
the error term in the measurement of the inaccuracy of the 
preinformation [7,8]. If the results confirm the a priori 
assumptions for the component, the information content is 
given by eq (13). 

$$I(p,p_0) = \log_2\left[\frac{x_1 - x_0}{\sigma\sqrt{2\pi e}}\right] \qquad (13)$$

The effect of a systematic error, $\delta$, reduces the information 
content by $\delta^2/(2\sigma^2)$ [7]. The use of parallel 
determinations, $n_p$, and the estimate of $\sigma$, $s$, results in 
eq (14), which utilizes Student's t test at the significance 
level of 0.038794. 

$$I(p,p_0) = \log_2\left[\frac{(x_1 - x_0)\sqrt{n_p}}{2\,s\,t(\nu)}\right] \qquad (14)$$

This level of significance is chosen so that twice the t value 
equals $\sqrt{2\pi e}$ as the number of degrees of freedom 
approaches infinity. The information content as measured by 
eq (14) is the practical form of eq (13) since usually only the 
estimate of the standard deviation is known. 
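
The relation between the two expressions can be checked numerically. The sketch below assumes that eq (13) has the form log2[(x1 - x0)/(sigma*sqrt(2*pi*e))] and eq (14) the form log2[(x1 - x0)*sqrt(n_p)/(2*s*t)]. The function names and the numerical inputs are hypothetical. The t value is passed in by the caller rather than looked up, and the limiting case of infinite degrees of freedom is evaluated with the standard normal distribution.

```python
import math
from statistics import NormalDist

SQRT_2PIE = math.sqrt(2 * math.pi * math.e)

def info_known_sigma(x0, x1, sigma):
    """Assumed form of eq (13): information content (bits), sigma known."""
    return math.log2((x1 - x0) / (sigma * SQRT_2PIE))

def info_estimated_sigma(x0, x1, s, n_p, t_value):
    """Assumed form of eq (14): s estimated from n_p parallel determinations."""
    return math.log2((x1 - x0) * math.sqrt(n_p) / (2 * s * t_value))

# t value at infinite degrees of freedom for which 2*t equals sqrt(2*pi*e)
t_inf = SQRT_2PIE / 2
# Two-sided significance level corresponding to that t value
alpha = 2 * (1 - NormalDist().cdf(t_inf))

# Hypothetical case: content expected between 0 and 10 units, sigma = 0.1
bits_eq13 = info_known_sigma(0.0, 10.0, 0.1)
bits_eq14 = info_estimated_sigma(0.0, 10.0, 0.1, 1, t_inf)
```

Under these assumptions `alpha` reproduces the quoted significance level of 0.038794, and the two expressions agree for a single determination in the limit of infinite degrees of freedom.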

Poisson distributed results can be approximated by a nor- 
mal distribution with the population mean, $\mu$, equal to the 
constant representing the average number of random points 
per unit time, $\lambda$, and the population standard deviation, 
$\sigma$, equal to $\sqrt{\lambda}$. This changes the equation for 
information content to that shown in eq (15) [25]. 

$$I(p,p_0) = \log_2\left[\frac{x_1 - x_0}{\sqrt{2\pi e\lambda}}\right] \qquad (15)$$

However, this approximation is less valid for small values of 
$\lambda$. 



Trace Analysis 

The model for trace analysis [7,8] is essentially the same 
as for quantitative analysis except that the output signal is 
often barely distinguishable from the background noise. In 
the first case, the content of the component to 
be determined is less than or equal to the detection limit of 
the analytical method and the only conclusion is that the 
content is somewhere between zero and the detection limit. 
The a posteriori probability density is equal to the in- 
verse of the detection limit of the method. The information 
content is given as the $\log_2$ of the ratio of the highest esti- 
mated content of the component to the detection limit for the 
component. In the second case, where the content of the 
component to be determined is greater than the detection 
limit, the content can be determined quantitatively. The a 
posteriori probability distribution is a shifted log-normal 
distribution. This information content differs from the infor- 
mation content of the first case by the addition of a 
$\log_2[\sqrt{n_p}/(k\sqrt{2\pi e})]$ term, where $n_p$ is the number of parallel 
determinations and $k$ is the asymmetry parameter for the 
shifted log-normal distribution. If the mean value for the 
determination is close to the detection limit, a truncated 
Gaussian distribution is used to describe the a posteriori 
distribution. The information content can be calculated as a 
function of the highest estimated value, the mean value, the 
standard deviation, the frequency, and the distribution func- 
tion. The information content for both the log-normal and 
the truncated Gaussian distributions converges to 
$\log_2[x_1/(\sigma\sqrt{2\pi e})]$. 
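
For the first case, where the result only establishes that the content lies below the detection limit, the information content reduces to a single logarithm. A minimal sketch follows. The function name and the numerical values are hypothetical.

```python
import math

def trace_information_below_limit(x_max, detection_limit):
    """Bits gained when the only conclusion is that the content lies
    between zero and the detection limit; x_max is the highest
    content expected beforehand."""
    if not 0 < detection_limit <= x_max:
        raise ValueError("expected 0 < detection_limit <= x_max")
    return math.log2(x_max / detection_limit)

# Hypothetical: content expected below 8 ppm, detection limit 1 ppm
bits = trace_information_below_limit(8.0, 1.0)
```

Halving the detection limit adds one bit in this case, which makes the value of a more sensitive method explicit even when the component is not detected.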

Electrochemistry 

Perone and coworkers have examined the effects of vari- 
ous experimental parameters on the qualitative information 
content of electrochemical data [26-31]. Because pattern 
recognition methods were used to obtain structural classifi- 
cation information empirically, an empirical measure of in- 
formation gain was used to assess the effects of experimen- 
tal parameters. This involved defining an appropriate figure 
of merit with which to measure the extent of informational 
goal achievement. Changes in information content are then 
determined by observing changes in attainment of the infor- 
mational goal. 

Byers, Freiser, and Perone [28,29] analyzed 45 com- 
pounds using cyclic staircase voltammetry. The data set 
consisted of 19 nitrobenzenes, 9 nitrodiphenyl ethers, and 
17 ortho-hydroxy azo compounds. Of the nitrodiphenyl 
ethers, 4 were strong herbicides and 5 were either weak or 
nonherbicides. The informational goals for the problem 
were the classification of the 45 compounds by their struc- 
tural type and the classification of the 9 nitrodiphenyl ethers 
according to herbicidal activity. Seven experimental vari- 
ables, percent ethanol in the solvent, pH, surfactant concen- 
tration, number of cycles, scan rate, mercury drop hang 
time, and sampling time, were varied in a fractional factorial 
design to generate a complete data base of collected cyclic 
staircase voltammograms and cyclic differential capacity 
curves [28] for subsequent analysis [29]. Faradaic and ca- 
pacitive variable effect curves were calculated from the 
data. The average entropy for the three classes was 1.5 bits. 
The maximum entropy was 1.6 bits. The figure of merit for 
the informational goals was the percent correct classification 
as achieved by k-nearest neighbor analysis. For the struc- 
tural characterization studies, the best overall percent classi- 
fication ranged from 76% using capacitive variable effect 
features to 93% for both capacitive variable effect features 
and faradaic variable effect features. Overall accuracy for 
structural classification using the faradaic variable effect 
curve features ranged from 67% for percent ethanol to 93% 
for number of cycles. For the herbicidal prediction using 
variable effect curve features, the percent correct classifica- 
tion ranged from 78% for pH, faradaic sampling time, ca- 
pacitive scan rate, and drop hang time to 100% for percent 
ethanol, surfactant, faradaic number of cycles, and scan 
rate. 
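
The entropy values quoted above can be reproduced from the class sizes alone, and the figure of merit can be sketched as a leave-one-out nearest neighbor test. The class sizes (19, 9, 17) come from the text. The feature vectors, labels, and function names in the second part are hypothetical, and the leave-one-out scheme is an assumption rather than the procedure documented in refs. [28,29].

```python
import math
from collections import Counter

def class_entropy(class_sizes):
    """Shannon entropy (bits) of the class membership distribution."""
    total = sum(class_sizes)
    return -sum((n / total) * math.log2(n / total) for n in class_sizes if n > 0)

sizes = [19, 9, 17]            # nitrobenzenes, nitrodiphenyl ethers, azo compounds
h = class_entropy(sizes)       # entropy for the actual class sizes
h_max = math.log2(len(sizes))  # entropy if the three classes were equally populated

def loo_knn_percent_correct(features, labels, k=1):
    """Percent correct classification by leave-one-out k-nearest neighbors."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        # Squared Euclidean distance to every other pattern
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, other)), labels[j])
            for j, other in enumerate(features) if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y:
            correct += 1
    return 100.0 * correct / len(labels)

# Hypothetical two-feature patterns for two well separated classes
features = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (1.0, 1.1), (1.1, 1.0), (0.9, 1.0)]
labels = ["A", "A", "A", "B", "B", "B"]
percent_correct = loo_knn_percent_correct(features, labels, k=1)
```

The class sizes give about 1.52 bits against a maximum of about 1.58 bits, which round to the 1.5 and 1.6 bits reported. A change in experimental conditions that raises the percent correct classification is then read as a gain in information toward the stated goal.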

Barnes and Perone [30] studied the enhancement of 
chemical process information through experimental design. 
A simple model of controlled potential electrochemical 
processes based upon the Cottrell equation [31] was devel- 
oped and implemented. The informational goal was to deter- 
mine the effects of input voltage sequence, data collection 
fraction to analyze, and preprocessing scheme upon the 
determination of the diffusion coefficient. The figure of 
merit for goal achievement was based upon the least squares 
criterion function. Three input voltages, 180 mV step, pseu- 
dorandom binary sequence with peak voltage, (E-E ), of 
180 mV, and white Gaussian noise with mean of 90 mV and 
variance of 3 mV, were presented to the model. Compari- 
sons of the identification results for the three inputs showed 
that the diffusion current model is fairly insensitive to the 
input sequence. Closer inspection of the model reveals that 
the anodic current is overwhelmed by the charging current. 
Therefore, the best input sequence for the model is that 
which is most easily generated. 

In order to investigate the effects of timing, 3,000 data 
points corresponding to three milliseconds in time were 
generated using a step input. The least squares identifier was 
applied to 1 msec intervals of data both with and without the 
removal of charging current effects. When the charging 
current was present, the best results were obtained for the 
analysis of data taken after 10 time constants of the charging 
network. With the charging current removed, the best re- 
sults were obtained with data taken within the first 1 msec 
interval when the amplitude of the anodic current and the 
signal to noise ratio are maximized. The best preprocessing 
scheme included filtering of unnecessary measurement 
components from the signal of interest, such as the removal 
of the charging current and signal averaging. 
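
A least squares estimate of the diffusion coefficient from a potential step can be sketched with the Cottrell equation, i(t) = nFAC*sqrt(D/(pi*t)), which is linear in t^(-1/2). The sketch below is a simplified stand-in for the identifier described above, not the model of ref. [30]. It uses synthetic, noise-free data with no charging current, and the electrode area, concentration, and function names are hypothetical.

```python
import math

F = 96485.33212  # Faraday constant, C/mol

def cottrell_current(t, n, area, conc, D):
    """Cottrell current (A) at time t (s); area in cm^2, conc in mol/cm^3,
    D in cm^2/s."""
    return n * F * area * conc * math.sqrt(D / (math.pi * t))

def estimate_diffusion_coefficient(times, currents, n, area, conc):
    """Least squares fit of i = slope * t**-0.5 through the origin,
    then D = pi * (slope / (n F A C))**2."""
    u = [t ** -0.5 for t in times]
    slope = sum(ui * ii for ui, ii in zip(u, currents)) / sum(ui * ui for ui in u)
    return math.pi * (slope / (n * F * area * conc)) ** 2

# Hypothetical experiment: 3,000 points over 3 ms after a potential step
D_true = 1.0e-5                               # cm^2/s
times = [k * 1.0e-6 for k in range(1, 3001)]  # 1 microsecond spacing
currents = [cottrell_current(t, 1, 0.02, 1.0e-6, D_true) for t in times]
D_est = estimate_diffusion_coefficient(times, currents, 1, 0.02, 1.0e-6)
```

With ideal data the fit returns the input value. Adding a charging current term to `currents` and refitting over different time windows would be one way to explore the timing effects described above.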

The application of information theory concepts to analyt- 
ical chemistry can illuminate methods to increase the effi- 
ciency of chemical analysis. Early work shows encouraging 
promise for these types of applications. Optimum conditions 
have been established for obtaining structural, herbicidal 
activity, and diffusion coefficient information from voltam- 
metric data. It has been demonstrated that the informational 
goal(s) will dictate the most favorable choice of experimen- 
tal conditions. The use of objective systematic information 
enhancement methods can highlight experimental parame- 
ters that are often traditionally overlooked. 

The Shannon theory of information has had a profound 
impact in science and technology. Shannon defined infor- 
mation in terms of the reduction of uncertainty which, in 
turn, was measured by entropy. He was concerned mainly 
with the use of information to measure the ability to transmit 
data through noisy channels, i.e., channel capacity. 

Statisticians have developed other, somewhat related, no- 
tions of information. In statistical theory, the major empha- 
sis has been on how well experimental data help to achieve 
the goals in the classical statistical problems of estimation 
and hypothesis testing. These measures serve two useful 
functions. They serve to set a standard for methods of data 
analysis, methods whose efficiencies are measured in terms 
of the proportion of the available information that is effec- 
tively used. They also serve to design efficient experiments. 



For the problem of estimation, Fisher introduced the 
Fisher Information which we now define. Suppose that it is 
desired to estimate a parameter $\theta$ using the result of an 
experiment which yields the data $X$ with the density $f(x|\theta)$. 
The Fisher Information for $\theta$ corresponding to $X$ is given by 
the matrix 

$$J = J(\theta) = E_\theta(YY^T) \qquad (1)$$

where $Y$ is the score function defined by 

$$Y = Y(X,\theta) = \frac{\partial}{\partial\theta}\log f(X|\theta) \qquad (2)$$

If $\theta$ is a multidimensional vector, $J$ is a nonnegative definite 